Abstract

We propose a new numerical method for the computation of the optimal 
value function of perturbed control systems and associated globally stabilizing 
optimal feedback controllers. The method is based on a set oriented discretiza- 
tion of state space in combination with a new algorithm for the computation 
of shortest paths in weighted directed hypergraphs. Using the concept of a 
multivalued game, we prove convergence of the scheme as the discretization 
parameter goes to zero. 

Key Words: optimal control, dynamic game, set oriented numerics, graph 
theory 

1 Introduction 

Global infinite horizon optimal control methods for the solution of general nonlinear 
stabilization problems are attractive for their flexibility and theoretical properties, 
because they are applicable to virtually all types of nonlinear dynamics, their op- 
timal value functions can typically be identified as Lyapunov functions and they 
allow for a rigorous treatment of perturbations in a game theoretical setting. How- 
ever, these methods have the drawback that their numerical solution requires the 
discretization of the state space which results in huge numerical problems both in 
terms of computational cost and in terms of memory requirements. Hence, in order 
to make these methods applicable to a broader range of systems, advanced numer- 
ical techniques are needed in order to reduce the computational effort as much as 
possible. 

A novel approach to such problems was presented in the recent paper [1] , where 
a set oriented numerical method for the approximate computation of the optimal 
value function of certain nonlinear optimal control problems has been developed. 
The approach relies on a division of state space into boxes that constitute the nodes 
of a directed weighted graph, where the weights are constructed from the given 
cost function. On this graph, standard graph theoretic algorithms for computing 
shortest paths can directly be applied, yielding an approximate value function which
is piecewise constant on the state space. At the same time, for every node in the 
graph, these algorithms compute the successor node on a shortest path, yielding 
approximate optimal pseudo-trajectories of the original system. Hence, this method 
combines a simple and hierarchically implementable discretization technique with 
efficient graph theoretic algorithms yielding both low memory consumption and 
a fast solution. For the problem of feedback stabilization the solution from [1],
however, is not directly applicable, because the resulting pseudo-trajectories would 
have to be postprocessed in order to obtain true solutions of the system. 

In [2] it was subsequently shown that the approximate optimal value function 
can in fact be used in order to construct a stabilizing feedback controller. Based on 
concepts from dynamic programming [3] and Lyapunov based approximate stability 
analysis [I], a statement about its optimality properties was given and a local a 
posteriori error estimate derived that enables an adaptive construction of the division 
of state space. However, due to the fact that the approximate optimal value function 
is not continuous, the constructed feedback law is in general not robust with respect 
to perturbations of the system. 

In the present paper, we show how to incorporate arbitrary perturbations into 
the framework sketched above. These perturbations can be either inherently con- 
tained in the underlying model, describing, e.g., external disturbances or the effect 
of unmodelled dynamics, or they could be added on top of the original model to 
account, e.g., for discretization errors. 

Our goal in this paper is to construct a feedback which is robust in the sense 
that on a certain subset of state space it stabilizes the system regardless of how the
perturbation acts. Conceptually, this problem leads to a dynamic game, where the 
controls and the perturbations are associated to two "players" that try to minimize 
and to maximize a given cost functional, respectively. We show how the discretiza- 
tion of state space in a natural way leads to a multivalued dynamic game (i.e. a 
discrete inclusion) and prove convergence of the associated value function when the 
images of the inclusion shrink to the original single-valued map. From this
multivalued game we derive a directed weighted hypergraph that gives a finite state
model of the original game. We formulate an adapted version of Dijkstra's algorithm in
order to compute the associated approximate value function and prove convergence 
when the box-diameter of the state space division goes to zero. 

It should be noted that the convergence analysis developed in this paper using 
multivalued dynamics is new also for the discretization of optimal control problems 
without perturbations in pp. An interesting side result of our study is that using 
this technique we are able to keep track of the effects of discontinuities in the ap- 
proximated optimal value function as induced, e.g., by state space constraints. This 
allows us to prove not only $L^\infty$ convergence in regions of continuity but also $L^1$
convergence in the whole domain of the optimal value function, provided that the 
optimal value function is continuous with respect to small changes in the state space 
constraints. 

Compared to other dynamic programming approaches to the stabilization of 
perturbed nonlinear systems (see, e.g., [5] and the references therein), the main 
advantages of our method are these general and rigorously provable convergence 
properties and the low computational cost of our perturbed version of Dijkstra's
algorithm, cf. Section 6.1. However, our new algorithm is also advantageous for
unperturbed problems when treating the spatial discretization errors as perturbation:
as Example (19) illustrates, this approach leads to considerably improved
performance on a significantly coarser discretization compared to [2].

The paper is organized as follows. In the ensuing Section 2 we describe the
problem formulation and the associated game theoretic interpretation. In Section 3
we introduce the concept of a multivalued game and an enclosure and prove a
statement about the convergence of the value function of a sequence of enclosures of a
multivalued game. These results are extended to systems with state constraints in
Section 4. In Section 5 we show how via the division of state space one obtains a
multivalued game from the original system, construct the corresponding hypergraph
and introduce an associated shortest path algorithm. Some hints on its
implementation, complexity issues as well as two numerical examples are addressed in Section 6.
Convergence of the numerical approximation to the optimal value function and the
construction of approximately optimal feedback laws are discussed in Sections 7 and
8, respectively.

2 Problem formulation 

We consider the problem of optimally stabilizing the discrete-time perturbed control 
system 

\[ x_{k+1} = f(x_k, u_k, w_k), \quad k = 0, 1, \ldots, \tag{1} \]

where $f : X \times U \times W \to X$ is continuous, $x_k \in X$ is the state of the system,
$u_k \in U$ is the control input and $w_k \in W$ is a perturbation parameter, chosen from
sets $X \subset \mathbb{R}^d$, $U \subset \mathbb{R}^m$ and $W \subset \mathbb{R}^\ell$. In addition to the evolution law, we are given a
continuous cost function $g : X \times U \to [0, \infty)$ that assigns the cost $g(x_k, u_k)$ to any
transition $x_{k+1} = f(x_k, u_k, w_k)$, $w_k \in W$.

Our goal is to derive an (optimal) feedback law $u : X \to U$ that stabilizes the
system in the sense that for a certain subset $S \subset X$ any trajectory starting in
$S$ tends to some prescribed set $O \subset X$, while the worst case accumulated cost is
minimized.

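Let us be more precise. For a given initial point $x \in X$, a control sequence
$\mathbf{u} = (u_k)_{k\in\mathbb{N}} \in U^{\mathbb{N}}$ and a perturbation sequence $\mathbf{w} = (w_k)_{k\in\mathbb{N}} \in W^{\mathbb{N}}$ yield the
trajectory $\mathbf{x}(x, \mathbf{u}, \mathbf{w}) = (x_k(x, \mathbf{u}, \mathbf{w}))_{k\in\mathbb{N}}$, defined by $x_0 = x$ and

\[ x_{k+1} = f(x_k(x, \mathbf{u}, \mathbf{w}), u_k, w_k), \quad k = 0, 1, \ldots, \tag{2} \]

while the associated accumulated cost is given by

\[ J(x, \mathbf{u}, \mathbf{w}) = \sum_{k=0}^{\infty} g(x_k(x, \mathbf{u}, \mathbf{w}), u_k). \]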
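The recursion (2) and the accumulated cost can be illustrated by a minimal Python sketch. It truncates the infinite sum to the length of the supplied sequences; the function name `accumulated_cost` and the scalar example system are assumptions made for illustration and are not taken from the paper.

```python
def accumulated_cost(f, g, x0, controls, perturbations):
    """Truncated accumulated cost J along x_{k+1} = f(x_k, u_k, w_k).

    The infinite sum is cut off at the length of the given finite sequences.
    """
    x, total = x0, 0.0
    for u, w in zip(controls, perturbations):
        total += g(x, u)   # cost g(x_k, u_k) of the current transition
        x = f(x, u, w)     # next state x_{k+1}
    return total, x


# Hypothetical scalar example (not taken from the paper).
f = lambda x, u, w: 0.5 * x + u + w
g = lambda x, u: x * x + u * u
cost, x_end = accumulated_cost(f, g, 1.0, [0.0] * 3, [0.0] * 3)
```

With zero control and zero perturbation the state halves in each step, so the three summands are 1, 0.25 and 0.0625.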



In order to formalize the interplay between the control and the perturbation we 
employ a game theoretic viewpoint which we describe next. The problem formulation 
actually already describes a game (see, e.g., [6]), where at each step of the iteration 
two "players" choose a control value $u_k$ and a perturbation value $w_k$, respectively.
The goal of the controlling player is to minimize J, while the perturbing player tries 
to maximize this quantity. 

We assume that the controlling player has to choose the value $u_k$ first and that the
perturbing player has the advantage of knowing $u_k$ when choosing the perturbation
value $w_k$. However, the perturbing player is not able to foresee future choices of
the controlling one. More formally, we restrict the choice of perturbation sequences
$\mathbf{w} \in W^{\mathbb{N}}$ to those that result from applying a nonanticipating strategy $\beta : U^{\mathbb{N}} \to W^{\mathbb{N}}$
to a given control sequence $\mathbf{u} \in U^{\mathbb{N}}$, i.e. we have $\mathbf{w} = \beta(\mathbf{u})$, with $\beta$ satisfying

\[ u_k = u'_k \ \ \forall k \le K \quad \Longrightarrow \quad \beta(\mathbf{u})_k = \beta(\mathbf{u}')_k \ \ \forall k \le K \]

for any two control sequences $\mathbf{u} = (u_k)_k$, $\mathbf{u}' = (u'_k)_k \in U^{\mathbb{N}}$. Let $\mathcal{B}$ denote the set of
all nonanticipating strategies $\beta : U^{\mathbb{N}} \to W^{\mathbb{N}}$.

As mentioned, our goal is to find a feedback law $u : X \to U$ such that with
controls $u_k = u(x_k)$, $x_k$ approaches a given set $O \subset X$, regardless of how the
perturbation sequence $\mathbf{w}$ is chosen. Accordingly, we assume that we know a compact
robust forward invariant set $O \subset X$, i.e. for all $x \in O$ there is a control $u \in U$ such
that $f(x, u, W) \subset O$. Since we are done with controlling the system once we are
on $O$, we assume that $g(x, u) = 0$ for all $x \in O$ and all $u \in U$ and $g(x, u) > 0$
for all $x \notin O$ and all $u \in U$. Further assumptions on $g$ and on the dynamics in a
neighborhood of $O$ will be specified later.

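Our construction of the feedback law will be based on the upper value function
$V : X \to [0, \infty]$,

\[ V(x) = \sup_{\beta \in \mathcal{B}} \inf_{\mathbf{u} \in U^{\mathbb{N}}} J(x, \mathbf{u}, \beta(\mathbf{u})), \tag{3} \]

of the game (1), which fulfills the optimality principle

\[ V(x) = \inf_{u \in U} \Big[ g(x, u) + \sup_{w \in W} V(f(x, u, w)) \Big]. \tag{4} \]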
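For a system with finitely many states, controls and perturbation values, the optimality principle (4) can be iterated directly. The following Python sketch is only an illustration under that finiteness assumption; the helper `upper_value`, the fixed number of sweeps and the toy system are hypothetical and not part of the paper's method, which instead works with the hypergraph algorithm of Section 5.

```python
import math


def upper_value(states, controls, perturbations, f, g, target, sweeps=50):
    """Iterate V(x) = min_u [ g(x,u) + max_w V(f(x,u,w)) ] on a finite system.

    V is pinned to 0 on the target set and initialised with infinity elsewhere.
    """
    V = {x: (0.0 if x in target else math.inf) for x in states}
    for _ in range(sweeps):
        for x in states:
            if x in target:
                continue
            V[x] = min(
                g(x, u) + max(V[f(x, u, w)] for w in perturbations)
                for u in controls
            )
    return V


# Hypothetical toy system on {0,...,4}: the control moves left by 1 or 2,
# the perturbation may push right by 1, every step outside the target costs 1.
states = range(5)
f = lambda x, u, w: min(max(x + u + w, 0), 4)
g = lambda x, u: 1.0
V = upper_value(states, [-1, -2], [0, 1], f, g, target={0})
```

In this toy example the worst-case perturbation cancels one unit of every move, so the controller needs `x` steps from state `x` and the computed value is `V[x] = x`.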



3 Multivalued games 

As we will see in the next section, our set oriented approach to the discretization of
state space of the perturbed control system (1) leads to a finite state multivalued
system. For the convergence analysis of this discretization it turns out to be useful
to introduce as an intermediate object an infinite state multivalued game defined by
a discrete inclusion. This is given by a multivalued map

\[ F : X \times U \times W \rightrightarrows X, \]

where $X \subset \mathbb{R}^d$ is a closed set and $U \subset \mathbb{R}^m$, $W \subset \mathbb{R}^\ell$ and the images of $F$ are compact
sets, together with a cost function

\[ G : X \times X \times U \times W \to [0, \infty). \]

In order to simplify our presentation we first assume that $F(x, u, w) \ne \emptyset$ for all
$x \in X$, $u \in U$, $w \in W$, which will be relaxed later, cf. Section 4. Further regularity
assumptions on these maps will be imposed when needed. Note that we have
introduced a second state argument in $G$, which allows us to associate different costs to the
trajectories of the associated discrete inclusion.

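For a given initial state $x \in X$, a given control sequence $\mathbf{u} = (u_k)_{k\in\mathbb{N}} \in U^{\mathbb{N}}$ and
a given perturbation sequence $\mathbf{w} = (w_k)_{k\in\mathbb{N}} \in W^{\mathbb{N}}$, a trajectory of the game is given
by any sequence $\mathbf{x} = (x_k)_{k\in\mathbb{N}} \in X^{\mathbb{N}}$ such that $x_0 = x$ and

\[ x_{k+1} \in F(x_k, u_k, w_k), \quad k = 0, 1, 2, \ldots. \]

We denote by

\[ \mathcal{X}_F(x, \mathbf{u}, \mathbf{w}) = \{ (x_k)_k \in X^{\mathbb{N}} \mid x_0 = x,\ x_{k+1} \in F(x_k, u_k, w_k)\ \forall k \in \mathbb{N} \} \]

the set of all trajectories of $F$ associated to $x$, $\mathbf{u}$ and $\mathbf{w}$. The accumulated cost is
given by

\[ J_{(F,G)}(x, \mathbf{u}, \mathbf{w}) = \inf_{(x_k)_k \in \mathcal{X}_F(x, \mathbf{u}, \mathbf{w})} \sum_{k=0}^{\infty} G(x_k, x_{k+1}, u_k, w_k). \]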

As in the previous section, we are interested in computing the upper value function

\[ V_{(F,G)}(x) = \sup_{\beta \in \mathcal{B}} \inf_{\mathbf{u} \in U^{\mathbb{N}}} J_{(F,G)}(x, \mathbf{u}, \beta(\mathbf{u})), \quad x \in X, \tag{5} \]

of this game. By standard dynamic programming arguments [7] one sees that this
function fulfills the optimality principle

\[ V_{(F,G)}(x) = \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x,u,w)} \{ G(x, x_1, u, w) + V_{(F,G)}(x_1) \}. \tag{6} \]

Observe that our original "single valued" game (2), (3) can be recast in this
multivalued setting by defining

\[ F(x, u, w) := \{ f(x, u, w) \} \quad \text{and} \quad G(x, x_1, u, w) := g(x, u). \]

We will now investigate the relation of the value functions of different multivalued 
games. For this purpose we first introduce the concept of an enclosure. 

Definition 1. If $(F_1, G_1)$ and $(F_2, G_2)$ are two multivalued games such that

\[ F_2(x, u, w) \subset F_1(x, u, w) \]

for all $x$, $u$ and $w$ and

\[ G_1(x, x', u, w) \le G_2(x, x', u, w) \]

for all $x$, $x' \in F_2(x, u, w)$ and all $u$ and $w$, then $(F_1, G_1)$ is called an enclosure of
$(F_2, G_2)$.

From this definition we immediately obtain the following proposition.
Proposition 1. Let the game $(F_1, G_1)$ be an enclosure of the game $(F_2, G_2)$. Then

\[ V_{(F_1,G_1)} \le V_{(F_2,G_2)}. \]
The next proposition studies the convergence of the value functions $V_{(F_i,G_i)}$ of a
sequence of games $(F_i, G_i)$. In this proposition $H$ denotes the Hausdorff distance
for compact sets.

Proposition 2. Let the sequence of games $(F_i, G_i)$, $i \in \mathbb{N}$, be enclosures of the game
$(F, G)$ and assume

\[ \sup H(F_i(x, u, w), F(x, u, w)) \to 0 \quad \text{as } i \to \infty \tag{7} \]

and

\[ \sup |G_i(x, x_1, u, w) - G(x, x_1, u, w)| \to 0 \quad \text{as } i \to \infty. \tag{8} \]

Assume furthermore that $F$ is upper semi-continuous in $x$ and that $G$ is continuous
in $x$ and $x_1$, both uniformly in $u$ and $w$ and on compact subsets of $X$. In addition,
we assume that there exists $\alpha \in \mathcal{K}_\infty$ (see footnote 1) with

\[ G(x, x_1, u, w) \ge \alpha(d(x, O) + d(x_1, O)) \]

and

\[ G_i(x, x_1, u, w) \ge \alpha(d(x, O) + d(x_1, O)) \]

for all $i \in \mathbb{N}$, $u \in U$, $w \in W$, and that $V_{(F,G)}$ is continuous on $\partial O$. Then for each
compact set $K \subset X$ for which $\sup_{x \in K} V_{(F,G)}(x) < \infty$ we have

\[ \sup_{x \in K} |V_{(F_i,G_i)}(x) - V_{(F,G)}(x)| \to 0 \quad \text{as } i \to \infty, \]

i.e., uniform convergence on compact sets in the domain of $V_{(F,G)}$.

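Proof. Let $k^* : X^{\mathbb{N}} \to \mathbb{N}$ be a bounded map. Then from the optimality principle (6)
we obtain by induction

\[ V_{(F,G)}(x) = \sup_{\beta \in \mathcal{B}} \inf_{\mathbf{u} \in U^{\mathbb{N}}} \inf_{\mathbf{x} \in \mathcal{X}_F(x, \mathbf{u}, \beta(\mathbf{u}))} \Big\{ \sum_{k=0}^{k^*(\mathbf{x})-1} G(x_k, x_{k+1}, u_k, \beta(\mathbf{u})_k) + V_{(F,G)}(x_{k^*(\mathbf{x})}) \Big\}. \]

Now let $\gamma := \sup_{x \in K} V_{(F,G)}(x)$. Due to the lower bound $\alpha$ on $G$, for every $\delta > 0$
there exists a time $k_{\gamma,\delta} \in \mathbb{N}$ such that for each trajectory $\mathbf{x} \in \mathcal{X}_F(x, \mathbf{u}, \beta(\mathbf{u}))$ with
cost bounded by $\gamma$ there exists a time $k^*(\mathbf{x}) \le k_{\gamma,\delta}$ such that $x_{k^*(\mathbf{x})} \in B_\delta(O)$. We fix
$\varepsilon > 0$ and $x \in K$ and choose $\delta > 0$ such that $V_{(F,G)}(y) \le \varepsilon$ for all $y \in B_\delta(O)$ ($\delta$ exists
because of the continuity of $V_{(F,G)}$ on $\partial O$). Then, using an $\varepsilon$-optimal perturbation
strategy $\beta^* \in \mathcal{B}$ and an arbitrary $\mathbf{u}^* \in U^{\mathbb{N}}$, from the above optimality principle we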



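obtain

\[
\begin{aligned}
V_{(F,G)}(x) &\le \inf_{\mathbf{u} \in U^{\mathbb{N}}} \inf_{\mathbf{x} \in \mathcal{X}_F(x, \mathbf{u}, \beta^*(\mathbf{u}))} \Big\{ \sum_{k=0}^{k^*(\mathbf{x})-1} G(x_k, x_{k+1}, u_k, \beta^*(\mathbf{u})_k) + V_{(F,G)}(x_{k^*(\mathbf{x})}) \Big\} + \varepsilon \\
&\le \inf_{\mathbf{u} \in U^{\mathbb{N}}} \inf_{\mathbf{x} \in \mathcal{X}_F(x, \mathbf{u}, \beta^*(\mathbf{u}))} \Big\{ \sum_{k=0}^{k^*(\mathbf{x})-1} G(x_k, x_{k+1}, u_k, \beta^*(\mathbf{u})_k) \Big\} + 2\varepsilon \\
&\le \inf_{\mathbf{x} \in \mathcal{X}_F(x, \mathbf{u}^*, \beta^*(\mathbf{u}^*))} \Big\{ \sum_{k=0}^{k^*(\mathbf{x})-1} G(x_k, x_{k+1}, u^*_k, \beta^*(\mathbf{u}^*)_k) \Big\} + 2\varepsilon.
\end{aligned}
\]

1 A function $\gamma : [0, \infty) \to [0, \infty)$ is of class $\mathcal{K}$ if it is continuous, zero at zero and strictly
increasing. It is of class $\mathcal{K}_\infty$ if, in addition, it is unbounded.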

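Now, fixing $\beta^*$, for any $i \in \mathbb{N}$ we can pick an $\varepsilon$-optimal control $\mathbf{u}_i^*$, yielding

\[
\begin{aligned}
\gamma \ge V_{(F_i,G_i)}(x) &\ge \inf_{\mathbf{x} \in \mathcal{X}_{F_i}(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))} \Big\{ \sum_{k=0}^{\infty} G_i(x_k, x_{k+1}, (\mathbf{u}_i^*)_k, \beta^*(\mathbf{u}_i^*)_k) \Big\} - \varepsilon \\
&\ge \inf_{\mathbf{x} \in \mathcal{X}_{F_i}(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))} \Big\{ \sum_{k=0}^{k^*(\mathbf{x})-1} G_i(x_k, x_{k+1}, (\mathbf{u}_i^*)_k, \beta^*(\mathbf{u}_i^*)_k) \Big\} - \varepsilon.
\end{aligned}
\]

In particular, this last expression is bounded by $\gamma$ and hence the lower bound $\alpha$ for
$G_i$ implies that there exists a compact set $K_1$ such that each $\varepsilon$-optimal trajectory
$(x_k)_k \in \mathcal{X}_{F_i}(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))$ lies in $K_1$ for all $i \in \mathbb{N}$.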

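Now assumption (7) and the upper semicontinuity of $F$ imply that for each
$\varepsilon_1 > 0$ there exists an $i_0 \in \mathbb{N}$ such that for $i \ge i_0$ and each such $\varepsilon$-optimal trajectory
$(x_k)_k \in \mathcal{X}_{F_i}(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))$ there exists a trajectory $(\tilde x_k)_k \in \mathcal{X}_F(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))$ with
$\|x_k - \tilde x_k\| \le \varepsilon_1$ for all $k = 1, \ldots, k_{\gamma,\delta}$. Hence (8) and the continuity of $G$ imply that
we can find $i_1 \in \mathbb{N}$ such that

\[
\inf_{(\tilde x_k)_k \in \mathcal{X}_F(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))} \Big\{ \sum_{k=0}^{k^*-1} G(\tilde x_k, \tilde x_{k+1}, (\mathbf{u}_i^*)_k, \beta^*(\mathbf{u}_i^*)_k) \Big\}
- \inf_{(x_k)_k \in \mathcal{X}_{F_i}(x, \mathbf{u}_i^*, \beta^*(\mathbf{u}_i^*))} \Big\{ \sum_{k=0}^{k^*-1} G_i(x_k, x_{k+1}, (\mathbf{u}_i^*)_k, \beta^*(\mathbf{u}_i^*)_k) \Big\} \le \varepsilon
\]

for all $i \ge i_1$ and all $k^* \in \{1, \ldots, k_{\gamma,\delta}\}$. Combining this inequality with the estimates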
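for $V_{(F,G)}$ and $V_{(F_i,G_i)}$, using $\mathbf{u}^* = \mathbf{u}_i^*$ in the former, we obtain

\[ V_{(F,G)}(x) \le V_{(F_i,G_i)}(x) + 4\varepsilon \]

for all $i \ge i_1$. Since $i_1$ depends only on $k_{\gamma,\delta}$ and $\varepsilon$, hence only on the set $K$ and not
on the individual $x$, we obtain the desired uniform convergence. □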
Remark 1. Note that we have obtained our result under very weak assumptions on
$F$ and $G$ using, however, the crucial continuity assumption of $V_{(F,G)}$ on $\partial O$. This
assumption, which is implicit and in general difficult to check directly, can be
ensured by the following asymptotic controllability assumption on the dynamics $F$
and the cost function $G$ in a neighborhood of $O$:

Assume that there exists a neighborhood $\mathcal{N}$ of $O$ and a $\mathcal{KL}$ function $\eta$ (see footnote 2) such that
for each $x_0 \in \mathcal{N}$ and each perturbation strategy $\beta \in \mathcal{B}$ there exists a control sequence
$\mathbf{u} \in U^{\mathbb{N}}$ and a trajectory $(x_k)_k \in \mathcal{X}_F(x_0, \mathbf{u}, \beta(\mathbf{u}))$ with

\[ d(x_k, O) \le \eta(d(x_0, O), k). \tag{9} \]

Then, using the construction from [8, Proof of Theorem 5.4], we find a $\mathcal{K}_\infty$ function
$\rho$ (denoted $\rho_2$ in [8]) such that $G(x_0, x_1, u, w) \le \rho(d(x_0, O))$ for $x_0 \in \mathcal{N}$ implies

\[ \sum_{k=0}^{\infty} G(x_k, x_{k+1}, u_k, \beta(\mathbf{u})_k) \le \sigma(d(x_0, O)) \]

for some $\mathcal{K}$ function $\sigma$. Since $\sigma(d(x, O)) \to 0$ as $d(x, O) \to 0$ this implies $V(x) \to 0$
as $d(x, O) \to 0$, which yields continuity of $V$ on $\partial O$. Note that condition (9) is weaker
than controllability conditions typically employed to ensure continuity in minimum
time problems or pursuit-evasion games (cf. e.g. [9, Chapter IV]) because we do not
require to be able to steer the system into the "target" set $O$ but only asymptotically
to $O$.

We also emphasize that we only need continuity at the boundary of O and that 
our optimal value function may be discontinuous elsewhere. 



4 State space constraints 

So far we have assumed $F(x, u, w) \ne \emptyset$ for all $x \in X$, $u \in U$, $w \in W$, which
guarantees that for each initial value $x$ and each pair of control and perturbation
sequences $\mathbf{u}$ and $\mathbf{w}$ we obtain at least one trajectory $(x_k)_k$ which is defined for all
$k \in \mathbb{N}_0$. However, in practice it will often be necessary to relax this assumption.

In order to motivate this relaxation, assume that we are given a multivalued
game $(\tilde F, G)$ on a state space $\tilde X \subset \mathbb{R}^d$. In our numerical approach, the state space
set $X$ on which we can solve the problem will be a compact set while the state
space $\tilde X$ of the given problem is often unbounded. In addition, from a modeling
point of view it might be desirable to introduce state constraints, e.g., in order to
avoid certain critical regions of the state space. In both cases, it will be necessary
to restrict the state space of the original problem defining

\[ F(x, u, w) := \tilde F(x, u, w) \cap X, \quad x \in X,\ u \in U,\ w \in W. \]

This construction may result in $F(x, u, w) = \emptyset$ for certain $x \in X$, $u \in U$, $w \in W$
and consequently it may happen that a solution trajectory will only exist for finite
time. More precisely, for given $F$, given $\mathbf{u} = (u_k)_k \in U^{\mathbb{N}}$, given $\mathbf{w} = (w_k)_k \in W^{\mathbb{N}}$
and any sequence $\mathbf{x} = (x_k)_k \in X^{\mathbb{N}}$ let

\[ k_{\max}(\mathbf{x}, \mathbf{u}, \mathbf{w}) = \max \{ k \in \mathbb{N} : x_{j+1} \in F(x_j, u_j, w_j),\ j = 0, \ldots, k-1 \} \]

be the maximal index up to which the sequence $\mathbf{x}$ constitutes a trajectory of $F$.
Since a trajectory with $k_{\max}(\mathbf{x}, \mathbf{u}, \mathbf{w}) < \infty$ cannot converge to the set $O$ we set

\[ J_{(F,G)}(x, \mathbf{u}, \mathbf{w}) := \infty \quad \text{if } k_{\max}(\mathbf{x}, \mathbf{u}, \mathbf{w}) < \infty \text{ for each } \mathbf{x} \in X^{\mathbb{N}} \text{ with } x_0 = x. \]

2 A function $\eta : [0, \infty) \times [0, \infty) \to [0, \infty)$ is of class $\mathcal{KL}$ if it is continuous, of class $\mathcal{K}$ in the first
variable and strictly decreasing to 0 in the second variable.

It is easy to see that Proposition 1 remains valid in this case, while Proposition
2 is more difficult to recover in this setting. The reason lies in the fact that any
enclosure will necessarily enlarge the set of possible trajectories, even if we apply the
same state space constraints to $F$ and $F_i$. In the presence of state space constraints
this means that for any $i$ there may exist a trajectory $\mathbf{x}_i$ of $F_i$ for which all nearby
trajectories $\mathbf{x}$ of $F$ violate the space constraints. In other words, unless very specific
knowledge about the dynamics $F$ is available and used for the construction of the
enclosure $F_i$, the enlargement of the dynamics has the implicit effect of relaxing the
state space constraints.

However, if we assume that the optimal value function is continuous with respect
to relaxations of the state space constraints, then we can recover Proposition 2. In
order to formalize this relaxation, for $\varepsilon > 0$ we define the space

\[ X_\varepsilon := \{ x \in \tilde X \mid d(x, X) \le \varepsilon \}, \]

the multivalued dynamics

\[ F_\varepsilon(x, u, w) := \tilde F(x, u, w) \cap X_\varepsilon \]

and the related optimal value function $V_{(F_\varepsilon,G)}$. Using this notation we can prove the
following variant of Proposition 2.

Proposition 3. Consider the state space constrained dynamics $F$ of $\tilde F$ and consider
a sequence of enclosures $(F_i, G_i)$ of $F$ on $X$. Let the assumptions of Proposition 2
hold for $F$ and $F_i$, where (7) in the case of $F(x, u, w) = \emptyset$ is to be understood as

\[ F_i(x, u, w) = \emptyset \quad \text{for all } i \in \mathbb{N} \text{ and all } x, u, w \text{ with } F(x, u, w) = \emptyset. \]

Assume, furthermore, that $\tilde F$ is upper semi-continuous in $x$ uniformly in $u$ and $w$
on compact subsets of $\tilde X$ and let $\|\cdot\|_p$ be the usual $p$-norm for real valued functions
on $X$ for some $p \in \{1, \ldots, \infty\}$.

Then for each compact set $K \subset X$ for which $\sup_{x \in K} V_{(F,G)}(x) < \infty$ and on which
the continuity assumption

\[ \| V_{(F_\varepsilon,G)}|_K - V_{(F,G)}|_K \|_p \to 0 \quad \text{as } \varepsilon \to 0 \tag{10} \]

holds, we have

\[ \| V_{(F_i,G_i)}|_K - V_{(F,G)}|_K \|_p \to 0 \quad \text{as } i \to \infty. \]
Proof. The assumptions on $\tilde F$ and $F_i$ imply that for each $\varepsilon > 0$, each $k^* \in \mathbb{N}$ and
each sufficiently large $i \in \mathbb{N}$, for each trajectory $\mathbf{x}_i$ of $F_i$ we can find a trajectory $\mathbf{x}_\varepsilon$
of $\tilde F$ with $\|x_{i,k} - x_{\varepsilon,k}\| \le \varepsilon$, $k = 0, \ldots, k^*$. Hence, up to the time $k^*$ the trajectory
$\mathbf{x}_\varepsilon$ is also a trajectory of $F_\varepsilon$. Thus, replacing $F$ by $F_\varepsilon$ we can follow the proof of
Proposition 2 in order to obtain

\[ V_{(F_\varepsilon,G)}(x) \le V_{(F_i,G_i)}(x) + 5\varepsilon \]

for all sufficiently large $i \in \mathbb{N}$ and all $x \in K$. Now (10) implies the assertion. □

Remark 2. Basically, the continuity assumption (10) demands that an arbitrarily
small relaxation of the state space constraints does not lead to large changes in the
optimal value function. If $V_{(F,G)}$ is continuous on $K$ then one can expect (10) to hold
for $p = \infty$, while if $V_{(F,G)}$ is discontinuous on $K$ (note that state space restrictions
may introduce discontinuities in the optimal value function) then we would only
expect (10) to hold with $p < \infty$ because the location of the discontinuity is likely to
change when the state constraint changes. We conjecture that (10) holds under mild
regularity conditions on the optimal control problem; a formal verification, however,
is beyond the scope of this paper.

In any case, we would like to emphasize that our result allows for a rigorous 
convergence proof of the approximating multivalued game in the presence of discon- 
tinuities, a feature which is rarely found in other approximation techniques. 



5 Discretization of the game 

In this section we describe the set oriented discretization technique which transforms 
our problem into a graph theoretic problem. In order to introduce our method, we 
first recall the corresponding procedure for unperturbed systems developed in [1] 
before we turn to the general setting. 

5.1 Discretizing the Unperturbed System 

If X is finite and there are no perturbations, then one can use a shortest path 
algorithm like Dijkstra's method [10], see also the appendix, in order to compute 
the value function, see, e.g., [7]. In [1] it has been shown how to discretize general 
optimal control problems with continuous state space such that this approach can be 
applied. We review this method here in a different formulation that directly carries 
over to the case of a perturbed control system in the next section. 

We consider a single valued control system $f : X \times U \to X$ ($f$ continuous, $X \subset \mathbb{R}^d$
and $U \subset \mathbb{R}^m$ compact, $0 \in X$, $0 \in U$, $f(0, 0) = 0$), together with a continuous cost
function $g : X \times U \to [0, \infty)$ with $g(x, u) > 0$ for $x \ne 0$ and $g(0, 0) = 0$. Let $\mathcal{P}$ be a
finite partition of $X$, i.e. $\mathcal{P}$ is a finite set of mutually disjoint subsets $P \subset X$. Define
the map $\pi : X \to \mathcal{P}$, $\pi(x) = P$, $x \in P$, as well as $\rho : X \rightrightarrows X$, $\rho = \pi^{-1} \circ \pi$ (i.e. to
each $x$, $\rho$ associates the set of the partition $\mathcal{P}$ which contains $x$).
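For a uniform box partition, the maps $\pi$ and $\rho$ reduce to simple index arithmetic. The following Python sketch assumes that $X$ is an axis-parallel box divided into equally sized cells; the names `box_index` and `box_bounds` and the convention of assigning boundary points to the last cell are illustrative assumptions, not part of the paper.

```python
def box_index(x, lower, upper, cells):
    """pi(x): multi-index of the box of a uniform partition that contains x."""
    idx = []
    for xi, lo, hi, n in zip(x, lower, upper, cells):
        if not lo <= xi <= hi:
            raise ValueError("point lies outside the state space X")
        # Points on the upper boundary are assigned to the last cell.
        idx.append(min(int((xi - lo) / (hi - lo) * n), n - 1))
    return tuple(idx)


def box_bounds(idx, lower, upper, cells):
    """rho(x) for any x with pi(x) = idx, returned as (lower corner, upper corner)."""
    lo_corner = tuple(lo + j * (hi - lo) / n
                      for j, lo, hi, n in zip(idx, lower, upper, cells))
    hi_corner = tuple(lo + (j + 1) * (hi - lo) / n
                      for j, lo, hi, n in zip(idx, lower, upper, cells))
    return lo_corner, hi_corner


# Hypothetical example: X = [0,1]^2 divided into 4 x 4 boxes.
lower, upper, cells = (0.0, 0.0), (1.0, 1.0), (4, 4)
P = box_index((0.3, 0.9), lower, upper, cells)
corners = box_bounds(P, lower, upper, cells)
```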
Box-enclosure of the system. Consider the multivalued game (which is actually
a multivalued control system since there are no perturbations here) $(F, G)$ with

\[ F(x, u, w) = F(x, u) := \rho(f(x, u)) \quad \text{and} \quad G(x, x_1, u, w) = g(x, u). \]

The optimality principle (6) in this case reads

\[ V_{(F,G)}(x) = \inf_{u \in U} \Big\{ g(x, u) + \inf_{x_1 \in F(x,u)} V_{(F,G)}(x_1) \Big\}. \tag{11} \]

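Projection onto piecewise constant functions. The right hand side of (11)
defines an operator on real valued functions on $X$, the dynamic programming operator
$L : \mathbb{R}^X \to \mathbb{R}^X$,

\[ L[v](x) = \inf_{u \in U} \Big\{ g(x, u) + \inf_{x_1 \in F(x,u)} v(x_1) \Big\}. \]

Note that the optimal value function $V_{(F,G)}$ is, by definition of $L$, a fixed point of
$L$, i.e. $L[V_{(F,G)}] = V_{(F,G)}$. Abusing notation, we identify the space $\mathbb{R}^{\mathcal{P}}$ with the
subspace of real valued functions on $X$ that are piecewise constant on the elements
of the partition $\mathcal{P}$ (in fact, we view $v \in \mathbb{R}^{\mathcal{P}}$ as the function $v \circ \pi \in \mathbb{R}^X$). We define
the projection $\varphi : \mathbb{R}^X \to \mathbb{R}^{\mathcal{P}} \subset \mathbb{R}^X$,

\[ \varphi[v](x) = \inf_{x' \in \rho(x)} v(x'), \]

and the corresponding discretized dynamic programming operator $L_{\mathcal{P}} : \mathbb{R}^{\mathcal{P}} \to \mathbb{R}^{\mathcal{P}}$,

\[ L_{\mathcal{P}} = \varphi \circ L. \]

Explicitly, the discretized operator reads

\[
\begin{aligned}
L_{\mathcal{P}}[v](x) &= \inf_{x' \in \rho(x)} \Big\{ \inf_{u \in U} \Big\{ g(x', u) + \inf_{x_1 \in F(x',u)} v(x_1) \Big\} \Big\} \\
&= \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + v(f(x', u)) \},
\end{aligned}
\]

since $v \in \mathbb{R}^{\mathcal{P}}$ is constant on each element of $\mathcal{P}$, i.e. on each set $F(x', u)$.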

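We define the discretized optimal value function $V_{\mathcal{P}} \in \mathbb{R}^{\mathcal{P}}$ as the unique fixed
point of $L_{\mathcal{P}}$ with $V_{\mathcal{P}}(0) = 0$. Then $V_{\mathcal{P}}$ satisfies the optimality principle

\[ V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + V_{\mathcal{P}}(f(x', u)) \}. \tag{12} \]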



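Graph theoretic formulation. Note that since $\mathcal{P}$ is finite, $V_{\mathcal{P}}(f(x', u))$ in (12)
can only take finitely many values. We can therefore rewrite (12) as

\[ V_{\mathcal{P}}(x) = \min_{P \in \pi(f(\rho(x), U))} \ \inf_{x' \in \rho(x),\, u \in U :\, f(x',u) \in P} \{ g(x', u) + V_{\mathcal{P}}(P) \}, \tag{13} \]

where $V_{\mathcal{P}}(P) = V_{\mathcal{P}}(x)$ for any $x \in P \in \mathcal{P}$. If we define the multivalued map (or,
equivalently, the directed graph) $\mathcal{F} : \mathcal{P} \rightrightarrows \mathcal{P}$,

\[ \mathcal{F}(P) = \pi(f(\pi^{-1}(P), U)), \quad P \in \mathcal{P}, \tag{14} \]

and the cost function

\[ \mathcal{G}(P', P) = \inf \{ g(x, u) \mid x \in P',\ f(x, u) \in P,\ u \in U \}, \tag{15} \]

we can rewrite (13) as

\[ V_{\mathcal{P}}(P) = \min_{P_1 \in \mathcal{F}(P)} \{ \mathcal{G}(P, P_1) + V_{\mathcal{P}}(P_1) \}. \]

Note that this is the optimality principle of a shortest path problem on the graph
$\mathcal{F}$ with edge weights $\mathcal{G}$, which can be solved by Dijkstra's algorithm.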
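As an illustration of this shortest path formulation, the following Python sketch computes the values $V_{\mathcal{P}}(P)$ on a small weighted directed graph with a binary heap. The edge dictionary, the node labels and the function name `box_value_function` are hypothetical; since the value propagates backwards from the destination boxes, the sketch traverses the edges in reverse direction.

```python
import heapq
import math


def box_value_function(edges, destinations):
    """Solve V(P) = min_{P1} { G(P, P1) + V(P1) } with V = 0 on the destinations.

    edges maps (P, P1) to the nonnegative weight G(P, P1).
    Boxes that cannot reach a destination do not appear in the result.
    """
    preds = {}
    for (p, p1), weight in edges.items():
        preds.setdefault(p1, []).append((p, weight))
    V = {d: 0.0 for d in destinations}
    heap = [(0.0, d) for d in destinations]
    heapq.heapify(heap)
    done = set()
    while heap:
        v, p1 = heapq.heappop(heap)
        if p1 in done:
            continue              # outdated heap entry
        done.add(p1)
        for p, weight in preds.get(p1, []):
            if v + weight < V.get(p, math.inf):
                V[p] = v + weight
                heapq.heappush(heap, (V[p], p))
    return V


# Hypothetical graph on four boxes with destination box 0.
edges = {(1, 0): 1.0, (2, 1): 1.0, (2, 0): 3.0, (3, 2): 0.5}
V = box_value_function(edges, [0])
```

Instead of decreasing keys in place, the sketch pushes a new heap entry and skips outdated ones, which is a common simplification with Python's `heapq`.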



5.2 Discretization of the Perturbed System 

Now we want to carry over the discretization procedure from the last section to our 
game setting. We proceed in a completely analogous way, additionally incorporating 
the perturbations now. This will ultimately lead to a directed hypergraph (actually 
a forward hypergraph or F-graph in the terminology of [11]) instead of an ordinary
graph for which we formulate the associated shortest path algorithm at the end of 
the section. 



Box-enclosure of the system. Consider the multivalued game $(F, G)$ with

\[ F(x, u, w) = \rho(f(x, u, w)) \quad \text{and} \quad G(x, x_1, u, w) = g(x, u), \tag{16} \]

(where $f$ and $g$ are the control system and cost function introduced in Section 2).
From the optimality principle (6) we obtain

\[
\begin{aligned}
V_{(F,G)}(x) &= \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x,u,w)} \{ g(x, u) + V_{(F,G)}(x_1) \} \\
&= \inf_{u \in U} \Big\{ g(x, u) + \sup_{w \in W} \inf_{x_1 \in F(x,u,w)} V_{(F,G)}(x_1) \Big\}.
\end{aligned}
\]



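Projection onto piecewise constant functions. The dynamic programming
operator $L : \mathbb{R}^X \to \mathbb{R}^X$ here reads

\[ L[v](x) = \inf_{u \in U} \Big\{ g(x, u) + \sup_{w \in W} \inf_{x_1 \in F(x,u,w)} v(x_1) \Big\}. \]

Correspondingly, the discretized operator $L_{\mathcal{P}} : \mathbb{R}^{\mathcal{P}} \to \mathbb{R}^{\mathcal{P}}$ is given by

\[
\begin{aligned}
L_{\mathcal{P}}[v](x) &= \inf_{x' \in \rho(x)} \Big\{ \inf_{u \in U} \Big\{ g(x', u) + \sup_{w \in W} \inf_{x_1 \in F(x',u,w)} v(x_1) \Big\} \Big\} \\
&= \inf_{x' \in \rho(x),\, u \in U} \Big\{ g(x', u) + \sup_{x_1 \in F(x',u,W)} v(x_1) \Big\},
\end{aligned}
\]

since $v \in \mathbb{R}^{\mathcal{P}}$ is constant on each element of $\mathcal{P}$, i.e. on each set $F(x', u, w)$.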

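We define the discretized optimal value function $V_{\mathcal{P}} \in \mathbb{R}^{\mathcal{P}}$ as the unique fixed
point of $L_{\mathcal{P}}$ with $V_{\mathcal{P}}(P) = 0$ for all partition elements $P \in \mathcal{P}$ with $\pi^{-1}(P) \cap O \ne \emptyset$.
Then $V_{\mathcal{P}}$ satisfies the optimality principle

\[ V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x),\, u \in U} \Big\{ g(x', u) + \sup_{x_1 \in F(x',u,W)} V_{\mathcal{P}}(x_1) \Big\}. \tag{17} \]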
Graph theoretic formulation. In order to derive the corresponding shortest
path algorithm, it is useful to formulate (17) equivalently in terms of an associated
graph. To this end note that for any pair $(x, u) \in X \times U$, the set $F(x, u, W) \subset X$ is
the union of a finite set of elements from the partition $\mathcal{P}$. In particular, the family
$\{F(x', u, W) : (x', u) \in \rho(x) \times U\}$ of subsets of $X$ is finite for any $x \in X$. Putting this
in terms of a corresponding map on $\mathcal{P}$: each partition element $P$ is mapped to a finite
family $\{\mathcal{N}_i\}_{i=1,\ldots,i(P)}$, $\mathcal{N}_i \subset \mathcal{P}$, of subsets of $\mathcal{P}$ under all perturbations. Formally, we
have a directed hypergraph $(\mathcal{P}, E)$ with the set $E \subset \mathcal{P} \times 2^{\mathcal{P}}$ of hyperedges given by

\[ E = \{ (P, \mathcal{N}) \mid \pi(F(x, u, W)) = \mathcal{N} \text{ for some } (x, u) \in P \times U \}, \]

or, equivalently, the multivalued map $\mathcal{F} : \mathcal{P} \rightrightarrows 2^{\mathcal{P}}$,

\[ \mathcal{F}(P) = \{ \pi(F(x, u, W)) : (x, u) \in P \times U \}, \]

cf. Figure 1.



Figure 1: Illustration of the construction of the hypergraph. (Labels in the figure: $f(x, u, W)$, $F(x, u, W)$, $\pi(F(x, u, W))$.)



If we define weights on the edges of this hypergraph by

\[ \mathcal{G}(P, \mathcal{N}) = \inf \{ g(x, u) : (x, u) \in P \times U,\ \pi(F(x, u, W)) = \mathcal{N} \}, \]

then we can write (17) equivalently as

\[ V_{\mathcal{P}}(P) = \inf_{\mathcal{N} \in \mathcal{F}(P)} \Big\{ \mathcal{G}(P, \mathcal{N}) + \sup_{N \in \mathcal{N}} V_{\mathcal{P}}(N) \Big\}. \tag{18} \]

Dijkstra's method for the perturbed system. We are now going to generalize
Dijkstra's algorithm (see the appendix) such that it computes the value function of
a weighted directed hypergraph (i.e. the function defined by the optimality principle
(18)).

Let $(\mathcal{P}, E)$, $E \subset \mathcal{P} \times 2^{\mathcal{P}}$, be a hypergraph with weights $\mathcal{G} : E \to [0, \infty)$. In
order to adapt Algorithm 2 we need to modify the relaxing step in lines 7-9, such
that the maximization over all perturbations (i.e. over $N \in \mathcal{N}$) in (18) is taken into
account. The modified version of lines 7-9 reads:

7   for each $(Q, \mathcal{N}) \in E$ with $P \in \mathcal{N}$
8       if $V(Q) > \mathcal{G}(Q, \mathcal{N}) + \max_{N \in \mathcal{N}} V(N)$ then
9           $V(Q) := \mathcal{G}(Q, \mathcal{N}) + \max_{N \in \mathcal{N}} V(N)$
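As justified by Proposition 5 (see the Appendix), if $\mathcal{N} \subset \mathcal{P} \setminus \mathcal{Q}$, then

\[ \max_{N \in \mathcal{N}} V(N) = V(P) \]

and the node $Q$ will never be relaxed again. On the other hand, if $\mathcal{N} \not\subset \mathcal{P} \setminus \mathcal{Q}$, then
$Q$ will be relaxed at a later time again and we do not need to relax it in this iteration
of the while-loop. These considerations lead to the following further modification of
lines 7-9:

7   for each $(Q, \mathcal{N}) \in E$ with $P \in \mathcal{N}$
8       if $\mathcal{N} \subset \mathcal{P} \setminus \mathcal{Q}$ then
9           if $V(Q) > \mathcal{G}(Q, \mathcal{N}) + V(P)$ then
10              $V(Q) := \mathcal{G}(Q, \mathcal{N}) + V(P)$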

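Including the adapted initialization, the overall algorithm for the case of a
perturbed system reads as follows. Here, $\mathcal{D} \subset \mathcal{P}$ is the set of destination nodes which
typically will be chosen as $\mathcal{D} = \{P \in \mathcal{P} : P \cap O \ne \emptyset\}$ (with the robust forward
invariant set $O$ from Section 2).

Algorithm 1. Perturbed Dijkstra$((\mathcal{P}, E), \mathcal{G}, \mathcal{D})$

1   for each $P \in \mathcal{P}$ set $V(P) := \infty$
2   for each $P \in \mathcal{D}$ set $V(P) := 0$
3   $\mathcal{Q} := \mathcal{P}$
4   while $\mathcal{Q} \ne \emptyset$
5       $P := \operatorname{argmin}_{P' \in \mathcal{Q}} V(P')$
6       $\mathcal{Q} := \mathcal{Q} \setminus \{P\}$
7       for each $(Q, \mathcal{N}) \in E$ with $P \in \mathcal{N}$
8           if $\mathcal{N} \subset \mathcal{P} \setminus \mathcal{Q}$ then
9               if $V(Q) > \mathcal{G}(Q, \mathcal{N}) + V(P)$ then
10                  $V(Q) := \mathcal{G}(Q, \mathcal{N}) + V(P)$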
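Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: hyperedges are assumed to be stored in a dictionary keyed by `(Q, frozenset(N))`, nodes are assumed to be mutually comparable labels (for instance integers), and the decrease of a value is realized by pushing a fresh heap entry and skipping outdated ones.

```python
import heapq
import math


def perturbed_dijkstra(nodes, hyperedges, destinations):
    """Value function of a weighted directed hypergraph, following Algorithm 1.

    hyperedges maps (Q, frozenset(N)) to the nonnegative weight G(Q, N).
    """
    V = {p: math.inf for p in nodes}                 # line 1
    for d in destinations:                           # line 2
        V[d] = 0.0
    # incoming[P] collects all hyperedges (Q, N) with P in N (used in line 7).
    incoming = {p: [] for p in nodes}
    for (q, n_set), weight in hyperedges.items():
        for n in n_set:
            incoming[n].append((q, n_set, weight))
    heap = [(V[p], p) for p in nodes]                # line 3
    heapq.heapify(heap)
    removed = set()                                  # complement of the queue
    while heap:                                      # line 4
        v, p = heapq.heappop(heap)                   # line 5
        if p in removed or v > V[p]:
            continue                                 # outdated heap entry
        removed.add(p)                               # line 6
        for q, n_set, weight in incoming[p]:         # line 7
            if n_set <= removed:                     # line 8
                if V[q] > weight + V[p]:             # line 9
                    V[q] = weight + V[p]             # line 10
                    heapq.heappush(heap, (V[q], q))
    return V


# Hypothetical hypergraph on four nodes with destination node 0.
hyperedges = {
    (1, frozenset({0})): 1.0,
    (2, frozenset({0, 1})): 1.0,
    (2, frozenset({3})): 0.1,
    (3, frozenset({1, 2})): 2.0,
    (3, frozenset({0})): 5.0,
}
V = perturbed_dijkstra(range(4), hyperedges, [0])
```

In the example, node 2 cannot profit from the cheap hyperedge into `{3}` because node 3 itself has value 4, so the worst case over `{0, 1}` determines its value.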

We note that this algorithm bears similarities with the SBT-algorithm in [11].
However, in our case the graph has a special structure (namely, the heads of the
hyperedges consist of only a single node, i.e. we have an F-graph as defined in [11]).
This yields the subquadratic complexity in the number of nodes as derived above 
and thus gives an improvement over SBT. 

6 Implementation and Numerical Examples 
6.1 Implementation 

In the numerical realization we always let the state space $X$ be a box in $\mathbb{R}^d$ and construct a partition $\mathcal{P}$ of it by dividing $X$ uniformly into smaller boxes. In fact, we realize this division by repeatedly bisecting the current division (changing the coordinate direction after each bisection). The resulting sequence of partitions can efficiently be stored as a binary tree; see [12] for more details.
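As an illustration of the bisection scheme, the following Python sketch locates the box containing a given point by descending an implicit binary tree, cycling through the coordinate directions. The function name `locate_box` and the encoding of a box as a tuple of 0/1 branch decisions are assumptions made here, not the data structure of [12].

```python
def locate_box(x, lower, upper, depth):
    """Return the branch path and the bounds of the box containing x.

    x:            point inside the outer box, as a sequence of floats
    lower, upper: corners of the outer box X
    depth:        number of bisections, giving 2**depth boxes in total
    """
    lo, hi = list(lower), list(upper)
    dim = len(lo)
    path = []
    for level in range(depth):
        axis = level % dim  # change the coordinate direction each time
        mid = 0.5 * (lo[axis] + hi[axis])
        if x[axis] < mid:
            path.append(0)  # left child: keep the lower half
            hi[axis] = mid
        else:
            path.append(1)  # right child: keep the upper half
            lo[axis] = mid
    return tuple(path), (tuple(lo), tuple(hi))


# Example: 2**14 boxes on [-8, 8] x [-10, 10].
path, (box_lo, box_hi) = locate_box((3.1, 0.1), (-8.0, -10.0), (8.0, 10.0), 14)
```

The branch path can serve directly as a node identifier in the hypergraph, and a prefix of it identifies the enclosing box on any coarser partition of the nested sequence.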

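In order to compute (or rather approximate) the set $E \subset \mathcal{P} \times 2^{\mathcal{P}}$ of hyperedges, we choose finite sets $\tilde P \subset P$, $\tilde U \subset U$ and $\tilde W \subset W$ of test points, typically on an equidistant grid in each of these sets. We then compute

$$\tilde{\mathcal{F}}(P) := \{\pi(F(x,u,\tilde W)) : (x,u) \in \tilde P \times \tilde U\} \subset 2^{\mathcal{P}}$$

as an approximation to $\mathcal{F}(P)$ and correspondingly approximate the weights on the hyperedges by

$$\tilde{\mathcal{G}}(P,\mathcal{N}) = \min\{g(x,u) : (x,u) \in \tilde P \times \tilde U,\ \pi(F(x,u,\tilde W)) = \mathcal{N}\}.$$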
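The construction above can be sketched in Python for a one-dimensional state space $[0,1]$ with a uniform partition. The function name `build_hypergraph`, the dictionary representation of the hyperedges and the rule of discarding test pairs whose image may leave the state space are assumptions made for this sketch; the dynamics and cost are those of the simple 1D example in Section 6.2.

```python
import math


def build_hypergraph(n_boxes, f, g, controls, perturbations, pts_per_box=10):
    """Approximate hyperedges and weights on a uniform partition of [0, 1].

    Returns a dict mapping (P, N) -> weight, where P is a box index and
    N is a frozenset of box indices (the hypernode hit under all tested
    perturbations).
    """
    width = 1.0 / n_boxes

    def box_of(y):
        # Index of the box containing y, or None if y lies outside [0, 1].
        if y < 0.0 or y > 1.0:
            return None
        return min(int(y * n_boxes), n_boxes - 1)

    edges = {}
    for P in range(n_boxes):
        # Equidistant test points inside box P.
        test_points = [(P + (j + 0.5) / pts_per_box) * width
                       for j in range(pts_per_box)]
        for x in test_points:
            for u in controls:
                images = [box_of(f(x, u, w)) for w in perturbations]
                if None in images:
                    continue  # assumption: drop pairs that may leave X
                key = (P, frozenset(images))
                # Weight: minimal cost over all test pairs with this image.
                edges[key] = min(edges.get(key, math.inf), g(x, u))
    return edges


# Data of the simple 1D example with a = 0.8 and eps = 0.01.
A, EPS = 0.8, 0.01
f = lambda x, u, w: x + (1.0 - A) * u * x + w
g = lambda x, u: (1.0 - A) * x
controls = [-1.0 + 0.2 * i for i in range(11)]
perturbations = [-EPS + 0.002 * i for i in range(11)]
edges = build_hypergraph(64, f, g, controls, perturbations)
```

Each test pair $(x,u)$ is mapped only once, for all test perturbations, which is the sampling effort referred to in the complexity discussion.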

Time and space complexity. The time complexity of the standard Dijkstra algorithm (Algorithm 2 in the appendix) strongly depends on the data structure which is used in order to store the set $\mathcal{Q}$. In particular, the complexity of the operations in line 5 (extracting the node with minimal $V$-value) and line 9 (decreasing the $V$-value and the associated reorganization of the data structure) has a crucial influence. In our implementation we are using a binary heap in order to store $\mathcal{Q}$ which leads to a complexity of $O((|\mathcal{P}| + |E|) \log |\mathcal{P}|)$.

In the perturbed case (Algorithm 1), each hyperedge is considered at most $N$ times in line 7, with $N$ being a bound on the cardinality of the hypernodes $\mathcal{N}$. Additionally, we need to perform the check in line 8, which has linear complexity in $N$. Thus, the overall complexity of the perturbed Dijkstra algorithm is $O(|\mathcal{P}| \log |\mathcal{P}| + |E| N (N + \log |\mathcal{P}|))$.

The space requirements grow linearly with the number of partition elements. Since typically the whole state space has to be covered, this number grows exponentially with the dimension of phase space (assuming a uniform partitioning). The concrete storage consumption strongly depends on the properties of the underlying control system. While the number of hyperedges is essentially determined by the Lipschitz constant of $f$, the size of the hypernodes $\mathcal{N}$ will crucially be influenced by the size of the perturbation. In the applications that we have in mind in this paper, these numbers are of moderate size.

As a rule of thumb, the main computational effort in our approach goes into the 
construction of the hypergraph via the mapping of test points - in particular, if the 
system is given by a short-time integration of a continuous time system. Note that 
this "sampling" of the system will be required in any method that computes the 
value function. Typically however, in standard methods like value iteration, certain 
points are sampled multiple times which leads to a higher computational effort in 
comparison to our approach. 

6.2 Numerical Examples 

A simple 1D system. We start by looking at an additively perturbed version of a simple 1D map from

$$x_{k+1} = x_k + (1 - a) u_k x_k + w_k, \quad k = 0, 1, \ldots,$$

with $x_k \in [0,1]$, $u_k \in [-1,1]$, $w_k \in [-\varepsilon,\varepsilon]$ for some $\varepsilon > 0$ and the fixed parameter $a \in (0,1)$. The cost function is

$$g(x,u) = (1 - a)x$$

so that (regardless of how the perturbation sequence is chosen) the optimal control policy is to steer to the origin as fast as possible, i.e. to choose $u_k = -1$ for all $k$. Similarly, the optimal strategy for the "perturbing player" is to slow down the dynamics as much as possible, corresponding to $w_k = \varepsilon$ for all $k$. The resulting dynamical system is the affine linear map

$$x_{k+1} = a x_k + \varepsilon, \quad k = 0, 1, \ldots,$$

which has a fixed point at $x = \varepsilon/(1 - a)$, i.e. under worst case conditions (assuming $w_k = \varepsilon$ for all $k$) it will be impossible to get any closer than $\alpha_0 := \varepsilon/(1 - a)$ to the origin. Correspondingly, we choose a neighborhood $O = [0,\alpha]$ with $\alpha > \alpha_0$ as our target region. With



$$k(x) = \left\lfloor \frac{\log \frac{\alpha - \alpha_0}{x - \alpha_0}}{\log a} \right\rfloor + 1,$$

the exact optimal value function is

$$V(x) = (x - \alpha_0)\left(1 - a^{k(x)}\right) + \varepsilon k(x),$$

as shown in Figure 2 for $a = 0.8$, $\varepsilon = 0.01$ and $\alpha = 1.1\alpha_0$. In that figure, we also show the approximate optimal value functions on partitions of 64, 256 and 1024 intervals, respectively. In the construction of the hypergraph, we used an equidistant grid of ten points in each partition interval, in the control space and in the perturbation space.
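The worst-case value of this example can be checked numerically. The following Python sketch simulates the affine map $x_{k+1} = a x_k + \varepsilon$ (control $u_k = -1$, perturbation $w_k = \varepsilon$) until the target region $[0,\alpha]$ is reached and accumulates the cost $(1-a)x_k$. The function names are chosen here for illustration, and the number of steps $k$ is counted by the simulation itself rather than taken from a closed formula.

```python
def worst_case_value(x, a=0.8, eps=0.01, alpha_factor=1.1):
    """Accumulated cost and step count until x enters [0, alpha]."""
    alpha0 = eps / (1.0 - a)       # fixed point of the worst-case map
    alpha = alpha_factor * alpha0  # boundary of the target region
    total, k = 0.0, 0
    while x > alpha:
        total += (1.0 - a) * x     # running cost g(x, u) = (1 - a) x
        x = a * x + eps            # u = -1, w = eps
        k += 1
    return total, k


def closed_form(x, k, a=0.8, eps=0.01):
    """(x - alpha0) (1 - a**k) + eps * k for a given number of steps k."""
    alpha0 = eps / (1.0 - a)
    return (x - alpha0) * (1.0 - a ** k) + eps * k


total, steps = worst_case_value(1.0)
difference = abs(total - closed_form(1.0, steps))
```

The agreement follows from $x_j = \alpha_0 + a^j (x_0 - \alpha_0)$, which gives $\sum_{j=0}^{k-1} (1-a) x_j = \varepsilon k + (x_0 - \alpha_0)(1 - a^k)$.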

The inverted pendulum — reloaded. As a more challenging test case, we re- 
consider the problem of designing an optimal globally stabilizing controller for an 
inverted pendulum on a cart (see [H 12]): 



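$$\left(\frac{4}{3} - m_r \cos^2 \varphi\right)\ddot{\varphi} + \frac{1}{2} m_r \dot{\varphi}^2 \sin 2\varphi - \frac{g}{\ell} \sin \varphi = -u\, \frac{m_r}{m \ell} \cos \varphi. \qquad (19)$$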

The equation models the (planar) motion of an inverted pendulum with mass $m = 2$ on a cart with mass $M = 8$ which moves under an applied horizontal force $u$. The angle $\varphi$ measures the offset angle from the vertical up position. The parameter $m_r = m/(m + M)$ is the mass ratio and $\ell = 0.5$ the distance of the pendulum mass from the pivot. We use $g = 9.8$ for the gravitational constant. The instantaneous cost is

$$q(\varphi, \dot{\varphi}, u) = \frac{1}{2}\left(0.1 \varphi^2 + 0.05 \dot{\varphi}^2 + 0.01 u^2\right). \qquad (20)$$



Figure 2: Perturbed simple 1D map: Upper value function and its approximations on various partitions.

Denoting the evolution operator of the control system (19) for constant control functions $u$ by $\Phi^t(x,u)$, we consider the time-$T$-map $\Phi^T(x,u)$ of this system as our discrete time system with $T = 0.1$. The map $\Phi^T$ is approximated via the classical Runge-Kutta scheme of order 4 with step size 0.02. Thus we arrive at the cost function

$$g(\varphi, \dot{\varphi}, u) = \int_0^T q(\Phi^t((\varphi, \dot{\varphi}), u), u)\, dt.$$

We choose $X = [-8, 8] \times [-10, 10]$ as the region of interest.
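A minimal Python sketch of the time-$T$-map is given below, assuming the pendulum equation in the form $(4/3 - m_r \cos^2\varphi)\ddot\varphi + \tfrac12 m_r \dot\varphi^2 \sin 2\varphi - (g/\ell)\sin\varphi = -u\,(m_r/(m\ell))\cos\varphi$ and the cost $q = \tfrac12(0.1\varphi^2 + 0.05\dot\varphi^2 + 0.01u^2)$. Integrating the cost as a third state component with the same Runge-Kutta scheme is a choice made for this sketch; the text does not specify the quadrature that was used. All function and constant names are hypothetical.

```python
import math

M_PEND, M_CART, LENGTH, GRAV = 2.0, 8.0, 0.5, 9.8
M_R = M_PEND / (M_PEND + M_CART)  # mass ratio m_r


def rhs(state, u):
    """Right-hand side for (phi, dphi, accumulated cost)."""
    phi, dphi, _ = state
    numerator = ((GRAV / LENGTH) * math.sin(phi)
                 - 0.5 * M_R * dphi ** 2 * math.sin(2.0 * phi)
                 - (M_R / (M_PEND * LENGTH)) * u * math.cos(phi))
    denominator = 4.0 / 3.0 - M_R * math.cos(phi) ** 2
    cost_rate = 0.5 * (0.1 * phi ** 2 + 0.05 * dphi ** 2 + 0.01 * u ** 2)
    return (dphi, numerator / denominator, cost_rate)


def rk4_step(state, u, h):
    """One classical Runge-Kutta step of order 4 with constant control u."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state, u)
    k2 = rhs(shifted(k1, 0.5 * h), u)
    k3 = rhs(shifted(k2, 0.5 * h), u)
    k4 = rhs(shifted(k3, h), u)
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def time_T_map(x, u, T=0.1, h=0.02):
    """Approximate Phi^T(x, u) and the integrated cost over [0, T]."""
    state = (x[0], x[1], 0.0)
    for _ in range(round(T / h)):
        state = rk4_step(state, u, h)
    return (state[0], state[1]), state[2]


next_state, stage_cost = time_T_map((3.1, 0.1), 0.0)
```

With $T = 0.1$ and step size $0.02$, each evaluation of the map costs five Runge-Kutta steps, which is why the mapping of test points dominates the construction of the hypergraph.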

In [2], a feedback trajectory with initial value $(3.1, 0.1)$ was computed that was based on an approximate optimal value function on a partition of $2^{18}$ boxes (cf. Figure 3 (left)). In contrast to what one might expect, the approximate optimal value function actually does not decrease monotonically along this trajectory (cf. Figure 3 (right)). This effect is due to the fact that the discretization method used in [2] allows for jumps in the trajectories which cannot be reproduced by the real system. The fact that the approximate optimal value function is not always decreasing indicates that the approximation accuracy in this example is just fine enough to allow for stabilization, and in fact, on a coarser partition of $2^{14}$ boxes, the associated feedback no longer stabilizes this initial condition.

We are now going to use the approach developed in this paper in order to design a stabilizing feedback controller on the basis of the coarser partition ($2^{14}$ boxes). To this end, we imagine the perturbation of our system being given as "for a given state $(\varphi, \dot{\varphi})$, be prepared to start anywhere in the box that contains $(\varphi, \dot{\varphi})$", i.e. we define our game by

$$F((\varphi, \dot{\varphi}), u, W) := \Phi^T(B, u),$$

where $B \in \mathcal{P}$ is the box in the partition $\mathcal{P}$ under consideration which contains the point $(\varphi, \dot{\varphi})$. Note that we do not need to parameterize the points in $\Phi^T(B, u)$ with $w \in W$ for the construction of the hypergraph.

Figure 4 shows the approximate upper value function on a partition of $2^{14}$ boxes with target region $O = [-0.1, 0.1]^2$ as well as the trajectory generated by the associated feedback for the initial value $(3.1, 0.1)$. As expected, the approximate value function is decreasing monotonically along this trajectory. Furthermore, despite the fact that we used considerably fewer boxes than for Figure 3, the resulting trajectory is obviously closer to the optimal one because it converges to the origin much faster.

Figure 3: Approximate optimal value function and feedback trajectory (left) and the approximate optimal value function along the feedback trajectory (right) for the inverted pendulum on a $2^{18}$ box partition.

Figure 4: Approximate upper value function and feedback trajectory (left) and the approximate upper value function along the feedback trajectory (right) for the inverted pendulum on a $2^{14}$ box partition using the robust feedback construction.

7 Convergence Analysis 

In this section we show that, and in which sense, the approximate optimal value function constructed in the preceding section converges to the true one as the underlying partitions are refined, using the abstract results for multivalued games developed in Sections 3 and 4.

We begin with the following observation on the relation between $V_{\mathcal{P}}$ and $V_{(F,G)}$ with $F$, $G$ from (16).

Proposition 4. Consider the discretized optimal value function $V_{\mathcal{P}}$ and the optimal value function $V_{(F,G)}$ from (5) corresponding to the game (16). If $V_{(F,G)}$ is continuous on $\partial O$, then these functions are related by

$$V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x)} V_{(F,G)}(x').$$

Proof: First note that both functions are nonnegative. From the previous considerations it follows that the functions satisfy the optimality principles

$$V_{(F,G)}(x) = \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x,u,w)} \{g(x,u) + V_{(F,G)}(x_1)\} \qquad (21)$$

and

$$V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x)} \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x',u,w)} \{g(x',u) + V_{\mathcal{P}}(x_1)\}. \qquad (22)$$

In order to show

$$\inf_{x' \in \rho(x)} V_{(F,G)}(x') \le V_{\mathcal{P}}(x), \qquad (23)$$

we number the elements $P_i$ of $\mathcal{P}$ such that $i_2 \ge i_1$ implies $V_{\mathcal{P}}|_{P_{i_2}} \ge V_{\mathcal{P}}|_{P_{i_1}}$. We first consider those elements $P_i$, $i = 1, \ldots, j$, for which we have $V_{\mathcal{P}}|_{P_i} = 0$, which by our assumptions on $V_{\mathcal{P}}$ and $g(x,u)$ is equivalent to $\pi^{-1}(P_i) \cap O \neq \emptyset$.

In case that $\pi^{-1}(P_i) \cap O \neq \emptyset$, we can find $x_0 \in \pi^{-1}(P_i) \cap O$ and $u_0 \in U$ such that $F(x_0, u_0, w) \subset O$ for all $w \in W$. In particular, for any fixed $w$ we find $x_1 \in F(x_0, u_0, w) \cap O$ for which we proceed the same way, which yields $F(x_1, u_1, w) \subset O$ for all $w \in W$. Hence, given a perturbation strategy $\beta(\mathbf{u})$ we find a control sequence $\mathbf{u}$ such that $X_F(x_0, \mathbf{u}, \beta(\mathbf{u})) \subset O$ implying

$$J_{(F,G)}(x_0, \mathbf{u}, \beta(\mathbf{u})) = \inf \sum_{k=0}^{\infty} G(x_k, x_{k+1}, u_k, \beta(\mathbf{u})_k) = 0$$

and thus

$$\inf_{x' \in \rho(x_0)} V_{(F,G)}(x') \le V_{(F,G)}(x_0) = 0 \le V_{\mathcal{P}}(x_0),$$

which shows (23) for $\rho(x) = P_i$ with $\pi^{-1}(P_i) \cap O \neq \emptyset$. In fact, what we showed is that $V_{(F,G)}(x) = 0$ for $x \in O$. Since we assumed that $V_{(F,G)}$ is continuous on $\partial O$, we also get

$$\inf_{x \in P} V_{(F,G)}(x) = 0$$

for $P$ with $\pi^{-1}(P) \cap \overline{O} \neq \emptyset$, but $\pi^{-1}(P) \cap O = \emptyset$.

Now we proceed by induction over $i \ge j + 1$. We pick some $i \ge j + 1$ and assume that the desired inequality (23) holds for $\rho(x) = P_1, \ldots, P_{i-1}$. We fix $x \in X$ with $\rho(x) = P_i$ and an arbitrary $\varepsilon > 0$. Then we pick $x'' \in P_i$ such that the infimum over $x'$ in (22) is attained up to $\varepsilon$. Thus we obtain

$$\begin{aligned}
V_{\mathcal{P}}(x) &= \inf_{x' \in \rho(x)} \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x',u,w)} \{g(x',u) + V_{\mathcal{P}}(x_1)\} \\
&\ge \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x'',u,w)} \{g(x'',u) + V_{\mathcal{P}}(x_1)\} - \varepsilon \\
&= \inf_{u \in U} \sup_{w \in W} \inf_{x_1 \in F(x'',u,w)} \{g(x'',u) + V_{(F,G)}(x_1)\} - \varepsilon \\
&= V_{(F,G)}(x'') - \varepsilon \ge \inf_{x' \in P_i} V_{(F,G)}(x') - \varepsilon,
\end{aligned}$$

where we have used the induction assumption in the third step as follows: the inequality $g(x,u) > 0$ implies $V_{\mathcal{P}}(x_1) < V_{\mathcal{P}}(x) = V_{\mathcal{P}}|_{P_i}$; furthermore we have $x_1 \in F(x'',u,w) = P_{i'}$ for some $i' \in \mathbb{N}$, i.e., $V_{\mathcal{P}}(x_1) = V_{\mathcal{P}}|_{P_{i'}}$. This implies $V_{\mathcal{P}}|_{P_i} > V_{\mathcal{P}}|_{P_{i'}}$ and consequently $i > i'$. Hence by the induction assumption we have

$$\inf_{x_1 \in F(x'',u,w)} V_{\mathcal{P}}(x_1) = V_{\mathcal{P}}|_{P_{i'}} = \inf_{x_1 \in F(x'',u,w)} V_{(F,G)}(x_1).$$

Now, since $\varepsilon > 0$ was arbitrary, we obtain (23).

The converse inequality $V_{\mathcal{P}}(x) \le \inf_{x' \in \rho(x)} V_{(F,G)}(x')$ follows by a similar induction argument using the fact that (21) always yields a larger value than (22) due to the additional minimization over $x'$ in (22). □



Remark 3. Note that in order to obtain the assertion from the preceding proposition, it is sufficient that the union of those partition elements that have nonempty intersection with $O$ forms a neighborhood of $O$. If this is true, one can actually drop the assumption on the continuity of $V_{(F,G)}$ on $\partial O$.

We now consider a sequence of increasingly finer partitions of $X$ and ask under which conditions the corresponding approximate optimal value functions converge to the value function of the game $(f,g)$. In a nested sequence of partitions, each element of a partition is contained in an element of the preceding partition.

The following theorem states our main convergence result. It shows that we obtain $L^\infty$ convergence on compact sets on which $V_{(f,g)}$ is continuous and, under a mild regularity condition on the set of discontinuities, $L^1$ convergence on every compact set on which $V_{(f,g)}$ is bounded. We first consider problems without state space constraints and address the constrained case in Remark 4 below.

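Theorem 1. Let $(\mathcal{P}_i)_{i \in \mathbb{N}}$ be a nested sequence of partitions of $X$ such that

$$\sup_{x \in X} H(\rho_i(x), \{x\}) \to 0 \quad \text{as } i \to \infty.$$

Assume that $g(x,u)$ is continuous, that $g(x,u) > 0$ for $x \notin O$ and that $V_{(f,g)}$ is continuous on $\partial O$. Then

$$\left\| V_{\mathcal{P}_i}|_{K_i} - V_{(f,g)}|_{K_i} \right\|_\infty \to 0 \quad \text{as } i \to \infty$$

for every compact set $K \subset X$ on which $V_{(f,g)}$ is continuous and

$$K_i = \bigcup_{P \in \mathcal{P}_i,\ \pi^{-1}(P) \subset K} \pi^{-1}(P)$$

being the largest subset of $K$ which is a union of partition elements $P \in \mathcal{P}_i$.

If we assume furthermore that the set of discontinuities of $V_{(f,g)}$ has zero Lebesgue measure, then

$$\left\| V_{\mathcal{P}_i}|_K - V_{(f,g)}|_K \right\|_{L^1} \to 0 \quad \text{as } i \to \infty$$

on every compact set $K \subset X$ with $\sup_{x \in K} V_{(f,g)}(x) < \infty$.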
Proof. We use Proposition 2 with $(F,G) = (f,g)$ ($f$ interpreted as a set valued map) and Proposition 4.

Note that since $F_i(x,u,w) = \rho_i(f(x,u,w))$ and $G_i(x,u,w) = g(x,u)$, the games $(F_i,G_i)$ are enclosures of $(f,g)$ (in fact, since the sequence of partitions is nested, for every $i$, $(F_i,G_i)$ is an enclosure of $(F_{i+1},G_{i+1})$). Under the assumptions of the theorem, all assumptions of Proposition 2 are satisfied. In particular, by the assumptions on $g$ and since $X$ and $U$ are compact, we know that there exists a function $\alpha \in \mathcal{K}_\infty$ such that

$$G_i(x, x_1, u, w) = g(x,u) \ge \alpha(d(x,O) + d(x_1,O))$$

for all $i$. Thus, $V_{(F_i,G_i)}$ converges uniformly to $V_{(f,g)}$ on $K$. In order to show the $L^\infty$ convergence on $K_i$, observe that if $V_{(f,g)}$ is continuous on $K$ then it is also uniformly continuous on $K$, which implies

$$\sup_{P \in \mathcal{P}_i,\ \pi^{-1}(P) \subset K} \left| \inf_{x \in P} V_{(f,g)}(x) - \sup_{x \in P} V_{(f,g)}(x) \right| \to 0$$

as $i \to \infty$. Thus we can use Proposition 4 in order to conclude

$$\begin{aligned}
\left\| V_{\mathcal{P}_i}|_{K_i} - V_{(f,g)}|_{K_i} \right\|_\infty
&\le \sup_{P \in \mathcal{P}_i,\ \pi^{-1}(P) \subset K} \left| V_{\mathcal{P}_i}|_P - \sup_{x \in P} V_{(f,g)}(x) \right| \\
&= \sup_{P \in \mathcal{P}_i,\ \pi^{-1}(P) \subset K} \left| \inf_{y \in P} V_{(F_i,G_i)}(y) - \sup_{x \in P} V_{(f,g)}(x) \right| \\
&\le \sup_{P \in \mathcal{P}_i,\ \pi^{-1}(P) \subset K} \left\{ \left| \inf_{y \in P} V_{(F_i,G_i)}(y) - \inf_{x \in P} V_{(f,g)}(x) \right| + \left| \inf_{x \in P} V_{(f,g)}(x) - \sup_{x \in P} V_{(f,g)}(x) \right| \right\} \to 0
\end{aligned}$$

as $i \to \infty$.

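In order to show the $L^1$ convergence, observe that the uniform convergence $V_{(F_i,G_i)} \to V_{(f,g)}$ on $K$ implies

$$\left\| V_{(F_i,G_i)}|_K - V_{(f,g)}|_K \right\|_{L^1} \to 0 \quad \text{as } i \to \infty.$$

It thus remains to show that $V_{(F_i,G_i)}|_K - V_{\mathcal{P}_i}|_K \to 0$ in $L^1$. Let $D$ be the set of discontinuities of $V_{(f,g)}$ and $\mathcal{D}_i = \{P \in \mathcal{P}_i : P \cap D \neq \emptyset\}$. We write

$$\int_K V_{(F_i,G_i)} - V_{\mathcal{P}_i} \, dm = I_{i,1} + I_{i,2}$$

with

$$I_{i,1} = \sum_{P \in \mathcal{D}_i} \int_{P \cap K} V_{(F_i,G_i)} - V_{\mathcal{P}_i} \, dm \qquad (24)$$

and

$$I_{i,2} = \sum_{P \notin \mathcal{D}_i} \int_{P \cap K} V_{(F_i,G_i)} - V_{\mathcal{P}_i} \, dm. \qquad (25)$$

Because of $V_{(f,g)} \ge V_{(F_i,G_i)}$, the assumption that $D$ has zero Lebesgue measure and $H(\rho_i(x),\{x\}) \to 0$, we have that $I_{i,1} \to 0$ for $i \to \infty$. Using Proposition 4, the compactness of $K$, and the fact that $V_{(F_i,G_i)}|_K \to V_{(f,g)}|_K$ uniformly, we also obtain that $I_{i,2} \to 0$ as $i \to \infty$, i.e. $V_{(F_i,G_i)}|_K - V_{\mathcal{P}_i}|_K \to 0$ in $L^1$ and thus the assertion of the theorem. □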






Corollary 1. Under the assumptions of Theorem 1 we have

$$V_{\mathcal{P}_i}(x) \to V_{(f,g)}(x) \quad \text{as } i \to \infty$$

for Lebesgue-almost all $x \in K$, where $K$ is any compact subset of the domain of $V_{(f,g)}$.

Proof. By standard arguments, there exists a subsequence such that $V_{\mathcal{P}_{i_j}}(x) \to V_{(f,g)}(x)$ as $j \to \infty$ for Lebesgue-almost all $x \in K$. Since $(V_{\mathcal{P}_i}(x))_{i \in \mathbb{N}}$ is monotone, we obtain the assertion. □

Remark 4. Using Proposition 3 instead of Proposition 2, it is easily seen that our convergence results remain valid in case of state space constraints if we assume condition (10) for $F(x,u,w) = \{f(x,u,w)\}$. In this case, the first assertion of Theorem 1 will hold for the $p$-norm from (12) instead of the $\infty$-norm.



8 Feedback Construction 

As usual, we use the approximate optimal value function $V_{\mathcal{P}}$ and the optimality principle in order to construct an approximate optimal feedback. More precisely, for any point $x \in S_0$, $S_0 := \{x \in X : V_{(f,g)}(x) < \infty\}$, we define

$$u_{\mathcal{P}}(x) = \operatorname{argmin}_{u \in U} \max_{w \in W} \{g(x,u) + V_{\mathcal{P}}(f(x,u,w))\}.$$

We can immediately adapt Theorem 3 from [2] in order to obtain a statement about 
the performance of this feedback. The following result in particular shows that the 
feedback is robust with respect to arbitrary perturbations of the system. 
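The min-max feedback rule can be sketched in Python as follows, with the minimization and maximization restricted to finite grids of controls and perturbations. The function name `robust_feedback` is hypothetical, and the value function used in the example is a simple monotone stand-in rather than a computed $V_{\mathcal{P}}$.

```python
import math


def robust_feedback(x, f, g, V, controls, perturbations):
    """Return (u, value) minimizing the worst case of g(x,u) + V(f(x,u,w))."""
    best_u, best_value = None, math.inf
    for u in controls:
        # Worst case over the tested perturbations for this control value.
        worst = max(g(x, u) + V(f(x, u, w)) for w in perturbations)
        if worst < best_value:
            best_u, best_value = u, worst
    return best_u, best_value


# Simple 1D example with a = 0.8 and eps = 0.01.
A, EPS = 0.8, 0.01
f = lambda x, u, w: x + (1.0 - A) * u * x + w
g = lambda x, u: (1.0 - A) * x
V_stand_in = lambda y: y  # placeholder for an approximate value function
u_star, worst_value = robust_feedback(
    0.5, f, g, V_stand_in,
    controls=[-1.0, -0.5, 0.0, 0.5, 1.0],
    perturbations=[-EPS, 0.0, EPS])
```

For a piecewise constant $V_{\mathcal{P}}$ the minimizer is in general not unique; the sketch returns the first minimizing grid value.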

Theorem 2. Let the assumptions of Theorem 1 be satisfied. Let $D \subset S_0$ be an open set with compact closure, such that $\overline{D} \subset S_0$, $O \subset D$ and on which $V_{(f,g)}$ is continuous. Let $c > 0$ be such that the inclusion $D_c(i_0) := V_{\mathcal{P}_{i_0}}^{-1}([0,c]) \subset D$ holds for some $i_0 \in \mathbb{N}$. Then there exists a function $\delta : \mathbb{R} \to \mathbb{R}$ with $\lim_{\alpha \to 0} \delta(\alpha) = 0$ such that for all sufficiently small $\varepsilon$, all sufficiently large $i$, all $\eta \in (0,1)$, all $x_0 \in D_c(i)$ and all perturbation sequences $(w_k)_k \in W^{\mathbb{N}}$, the trajectory generated by

$$x_{k+1} = f(x_k, u_{\mathcal{P}_i}(x_k), w_k)$$

satisfies

$$V(x_k) \le \max\left\{ V(x_0) - (1 - \eta) \sum_{j=0}^{k-1} g(x_j, u_{\mathcal{P}_i}(x_j)),\ \delta(\varepsilon/\eta) + \varepsilon \right\}.$$

Proof. We only point out how to suitably modify the proof of Theorem 3 in [2]. First note that according to Theorem 1, $V_{\mathcal{P}_i}$ converges uniformly to $V_{(f,g)}$ on $D$. The second observation is that if we choose $i_1 \in \mathbb{N}$, $i_1 \ge i_0$ such that $V_{(f,g)}(x) - V_{\mathcal{P}_i}(x) \le \varepsilon/2$ for $i \ge i_1$ and all $x \in D_c(i_1)$, then

$$\begin{aligned}
V_{\mathcal{P}_i}(x) + \varepsilon/2 \ge V(x) &= \inf_{u \in U} \sup_{w \in W} \{g(x,u) + V(f(x,u,w))\} \\
&\ge \min_{u \in U} \max_{w \in W} \{g(x,u) + V_{\mathcal{P}_i}(f(x,u,w))\} \\
&= g(x, u_{\mathcal{P}_i}(x)) + \max_{w \in W} V_{\mathcal{P}_i}(f(x, u_{\mathcal{P}_i}(x), w)),
\end{aligned}$$

i.e.

$$V_{\mathcal{P}_i}(x_{k+1}) \le V_{\mathcal{P}_i}(x_k) - g(x_k, u_{\mathcal{P}_i}(x_k)) + \varepsilon/2$$

for all $x_{k+1} \in f(x_k, u_{\mathcal{P}_i}(x_k), W)$. The rest of the proof of Theorem 3 in [2] remains the same. □

Remark 5. A particular application of our result is to robustify the feedback construction from [2] with respect to small perturbations which may be due, e.g., to discretization errors resulting from the numerical computation of the discrete time system from an ordinary differential equation. For this purpose, a particularly convenient way is to consider an "$\varepsilon$-inflated" system related to the original unperturbed system. More precisely, given an unperturbed control system $f : X \times U \to X$, one considers the perturbed system

$$x_{k+1} = f(x_k, u_k) + \varepsilon w_k, \quad k = 0, 1, \ldots,$$

with $w_k \in [-1,1]^d$ for some (small) $\varepsilon > 0$. In the numerical realization, the sets $F(x,u,W) = f(x,u) + \varepsilon[-1,1]^d$ are easy to construct using ideas from rigorous
discretization, see ITS] [7^ - 



A Dijkstra's Method 

Let $(\mathcal{P},E)$ be a finite directed graph with edge weights $\mathcal{G} : E \to [0,\infty)$. Let $D \in \mathcal{P}$ be the destination node. The following algorithm [10] computes the length $V(P) \in [0,\infty)$ of the shortest path from $P$ to $D$ for all nodes $P \in \mathcal{P}$.

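Algorithm 2. Dijkstra$((\mathcal{P}, E), \mathcal{G}, D)$

1  for each $P \in \mathcal{P}$ set $V(P) := \infty$

2  $V(D) := 0$

3  $\mathcal{Q} := \mathcal{P}$

4  while $\mathcal{Q} \neq \emptyset$

5      $P := \operatorname{argmin}_{P' \in \mathcal{Q}} V(P')$

6      $\mathcal{Q} := \mathcal{Q} \setminus \{P\}$

7      for each $Q \in \mathcal{P}$ with $(Q,P) \in E$

8          if $V(Q) > \mathcal{G}(Q,P) + V(P)$ then

9              $V(Q) := \mathcal{G}(Q,P) + V(P)$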
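For comparison with the perturbed variant, here is a compact Python sketch of Algorithm 2 with a binary heap. The function name `dijkstra` and the edge dictionary are assumptions made for illustration. Unlike Algorithm 2, the sketch never extracts nodes from which the destination is unreachable; they simply keep the value infinity. It also records the values in the order of extraction.

```python
import heapq
import math


def dijkstra(nodes, edges, dest):
    """Shortest path lengths to dest.

    edges: dict mapping (Q, P) -> nonnegative weight of the edge Q -> P.
    Returns the values V and the list of values in extraction order.
    """
    V = {P: math.inf for P in nodes}
    V[dest] = 0.0
    incoming = {P: [] for P in nodes}
    for (Q, P), weight in edges.items():
        incoming[P].append((Q, weight))

    heap = [(0.0, dest)]
    done, extracted = set(), []
    while heap:
        value, P = heapq.heappop(heap)
        if P in done:
            continue  # stale entry
        done.add(P)
        extracted.append(value)
        for Q, weight in incoming[P]:  # lines 7-9: relax edges into P
            if V[Q] > weight + value:
                V[Q] = weight + value
                heapq.heappush(heap, (V[Q], Q))
    return V, extracted


values, extracted = dijkstra(
    ["A", "B", "C", "D"],
    {("A", "D"): 1.0, ("B", "A"): 2.0, ("B", "D"): 5.0, ("C", "C"): 1.0},
    "D")
```

Because the edge weights are nonnegative, the extracted values form a nondecreasing sequence, which is the property the perturbed algorithm relies on.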

An important feature of this algorithm is given by the following proposition, 
which follows immediately from the construction of the algorithm and the fact that 
the edge weights are nonnegative. 

Proposition 5. During the while-loop in lines 4-9 of Algorithm 2 it holds that

$$V(P) \ge V(P') \quad \text{for all } P' \in \mathcal{P} \setminus \mathcal{Q}.$$